Method of searching novel ligand compounds from three-dimensional structure database

ABSTRACT

A method of searching one or more ligand-candidate compounds to a target biopolymer from a three-dimensional structure database, which comprises the steps of: (i) the first step of assigning hydrogen-bonding category numbers, information for calculating force-field energy, and information for generating conformations to two or more trial compounds in addition to atomic three-dimensional coordinates thereof; (ii) the second step of preparing physicochemical information about a ligand-binding region and one or more dummy atoms based on the three-dimensional atomic coordinates of the target biopolymer;(iii) the third step of estimating the most stable docking structure, wherein said step further comprises the steps of examining possible docking structures by docking a trial compound to the biopolymer while varying conformations of the trial compound, evaluating interaction energies between the target biopolymer and the trial compound, and repeating structure optimization; (iv) the fourth step of deciding whether or not the trial compound should be adopted as a ligand-candidate compound based on given criteria including the interaction energy values between the target biopolymer and the trial compound in the most stable docking structure estimated according to the step (iii); and (v) the fifth step of repeating the step (iii) and the step (iv) for all of the trial compounds.

TECHNICAL FIELD

The present invention relates to a method of searching athree-dimensional structure databases, which can be utilized for findingnovel lead-structures of medicaments, pesticides, and other biologicallyactive compounds.

BACKGROUND ART

Medicaments generally exhibit their biological activities through stronginteractions with target biopolymers. In recent years, three-dimensionalstructures no- of biopolymers, which play important roles in livingbodies, have been revealed successively on the basis of progresses inX-ray crystallographic analyses and NMR technology. With the advance ofthese researches, methodologies of theoretical approach based oninformation about three-dimensional structures of biopolymers and theirsuccessful results have been reported with respect to lead generationsof new drugs, which were conventionally achieved mostly by randomscreenings or accidental discoveries. Importance of such approach isrecognized with increasing interest, and researches from variousviewpoints have been conducted in all the world.

An object of these researches is to provide a process for creation ofligand candidate structures by means of a computer. Another object is toprovide a process for searching for ligand structures from athree-dimensional structure database containing known compounds. Theformer (automatic structure construction approach) is advantageous inthat the process can suggest a wide variety of possible ligandstructures irrespective of known or unknown structures. On the otherhand, the latter (database approach) is used in searching for novelbiologically active compounds from databases comprised of accumulatedstructures of known compounds. When a search is applied to a databasewhich consists of compounds stocked in an own company or commerciallyavailable compounds, this method is particularly advantageous in thatcompounds found to reach criteria of hit (“hit compounds”) can beobtained without any synthetic efforts, and their association constantsto target biopolymers and biological activities can be measured.

Synthesis of a compound with a novel molecular skeleton generally needsseveral months even for an expert, and accordingly, years of and burdenof work are required to synthesize a lot of promising structures andmeasuring their activities. However, if hit compounds are alreadyavailable, measurements of association constants and biologicalactivities can be easily performed for dozens of promising ligandcandidate compounds. In addition, based on the structure of a compoundfound to have a considerable extent of high association constant oractivity, an efficient lead generation is achievable by designingcompounds with higher association constants and biological activitiesand synthesizing the compounds. For these reasons, approaches haverecently been interested which comprises the steps of searchingthree-dimensional structure databases consisting of known compounds anddiscovering novel pharmacologically active compounds.

However, one of problems of this method is by what query the search iscarried out. Generally, a three-dimensional structure database issearched by queries of whether or not given atomic groups or functionalgroups exist which are presumed as essential for the desired activity intypical biologically active compounds used as templates, oralternatively, whether or not their relative positions are similar tothose of the template compounds. Where the structure of a targetbiopolymer is unknown, basically no approach can be relied on other thanthis process. However, since these queries are based on assumptions andhypotheses, hit compounds may often fail to have the same activity asthat of the template compounds. Where two or more molecules with quitedifferent structural features exhibit similar activities, it can onlybecome possible to create more probable operating hypotheses, withreference to functional groups essential for the activity and theirrelative positions, by means of structural information about thesemolecules.

The most difficult problem in the search of three-dimensional structuredatabases lies in the handling of conformational flexibility ofcompounds. It has been known that the conformations of a ligand as itbinds to a biopolymer (i.e., the most stable complex is formed) do notnecessarily accord with any of stable structures of the molecule itself,in the state of a crystal or a solution, or a structure with the moststable energy obtained by energy calculation, and that one ligandmolecule can form stable complexes in different conformations dependingon target biopolymers. Generally, a set of atomic coordinatesrepresenting a three-dimensional structure of one conformation amongmultiple possible conformations is contained as for one compound in adatabase. Furthermore, except for databases derived from crystalstructures, information of three-dimensional structure of each compoundis obtained from two-dimensionally inputted structure throughcalculation. These three-dimensional structures often represent one oflocal minimum structures that can be taken by the molecule, per se.

Therefore, if a search is carried out merely on the basis ofconformations contained in a database to judge whether or not compoundsmeet the criteria of hit, most compounds fail to be selected whichshould be hit if other conformations are considered. Although the numberof conformations to be considered may vary depending on the number ofrotatable bonds, as well as factors such as degrees of preciseconsideration of conformations, several tens to hundreds of thousandsconformations should be taken into account for a moderately flexiblemolecule containing 3 to 6 rotatable bonds. In order to consider thesepossible conformations, available processes are limited to either ofthose comprising the steps of selecting promising conformations andinputting them in a database beforehand, or alternatively, generatingconformations and examining at the time of conducting a search. In anyevents, enormous computer resources and calculation time are required.

Recently, Kearsley et al. with Merck prepared a database which comprises20 conformations at most for a single compound and reported “FLOG,” asearching system in which a search is carried out using searchingstandard mainly consisting of relative positions of functional groups(M. D. Miller, S. K. Kearsley, D. J. Underwood, and R. P. Sheridan:Journal of Computer-Aided Molecular Design, 8, 1994, pp.153-174, FLOG: Asystem to select ‘quasi-flexible’ ligands complementary to a receptor ofknown three-dimensional structure). It was reported that the FLOG tookapproximately one week to complete searching a database containing2,000,000 conformations for 100,000 compounds by means of a supercomputer CRAY. The twenty conformations in average for each compound arestored by selecting energetically stable local minimum structuresthrough prior conformational analysis of each compound, for whichenormous time and burden of work are needed. Nevertheless, twentyconformations per a single compound are insufficient.

In addition, the method adopts the query which comprise the presence orabsence of functional groups presumably essential for the activity of acompound as a template, as well as distance, angle, or direction betweenthe groups. These queries are most plain among possible queries, andalthough they have advantages to shorten calculation time using simplealgorithm, high probability cannot be expected that hit compoundsactually have the desired activity. The reason lies in that a moleculehaving inappropriate whole molecular shape and size fails to exhibitactivity, even if functional groups merely have desired relativepositions. If a three-dimensional structure of the target biopolymer isknown, the most effective search can be achieved by means of suchinformation, which provides high probability that hit compounds have thedesired activity.

For example, Eyermann et al. with Dupont Merck performed a databasesearch based on relative positions of functional groups of ligandmolecules which had been analyzed by X-ray crystallography as a complexwith a biopolymer (P. Y. S. Lam, P. K. Jadhav, C. J. Eyemann, C. N.Hodge, Y. Ru, L. T. Bacheler, J. L. Meek, M. J. Otto, M. M. Rayner, Y.N. Wong, C. H. Chang, P. C. Weber, D. A. Jackson, T. R. Sharp, and S.Erickson-Viitanen: Science, 263, 1994, pp.380-384, Rational design ofPotent, Bioavailable, Nonpeptide Cyclic Ureas as HIV ProteaseInhibitors.). In the HIV protease system, compounds were searched for asto criteria where the three positions of an oxygen atoms in watermolecules are in similar relative positions which connect the proteaseto the ligand through hydrogen bonds at the centers of two benzene ringsin the peptide ligand molecule. Among the hit compounds, one compoundwas selected which well fitted into the ligand-binding region and formedhydrogen bonds other than those formed by the oxygen atoms used asquery. They reported that a compound with high activity was developedbased on the structure of this compound by repeated works ofsynthesizing compounds with appropriate modifications while measuringinhibitory activities against the enzyme. However, the publication lacksdetailed descriptions as for the search process, and their technique ofhandling conformations is not deducible.

The search query capable of providing the highest probability that hitcompounds have desired activity is that whether or not compounds stablybind to a ligand-binding region of the target biopolymer. Although insome cases the desired activity fails to be exhibited due tophysicochemical factors such as water solubility even if a compoundsatisfies the criteria, reaching the criteria is an essentialrequirement for the expression of activity. On the other hand, thepresence or absence of specific functional groups and the relativepositions of functional groups are not necessarily essential for theexpression of activity, and even if a compound falls within a differentfamily and has dissimilar skeletal structure to the known activestructures, the compound may possibly exhibit the same activity if itcan form a complex between the target biopolymer at a similar extent ofstability. Therefore, compounds satisfying the criteria that they canstably bind to a ligand-binding region of the target biopolymer havehigh probability of being actual candidates as ligands. In other words,by using the criteria of fitting to a ligand-binding region of thetarget biopolymer, it becomes possible to identify compounds whichexhibit the desired biological activity and have a wide variety ofstructures from databases.

In order to determine whether or not a given ligand molecule can bind tothe target biopolymer, it is necessary to find the most stable dockingstructure among possible docking structures and to know the degree ofstability of the docking structure. Also, in order to find the moststable docking structure between a given biopolymer and a ligandmolecule, it is necessary to estimate stabilities of all dockingstructures by considering all possible binding modes (corresponding torotation or translation of one molecule while fixing another compound),and possible ligand conformations. However, since the process needsenormous calculation, it cannot be achieved by interactive methods usinga graphic display. Therefore, the development of an automatic dockingmethod has been desired which can perform the above processautomatically and efficiently.

As automatic docking methods, Kuntz et al. developed a method forestimating possible binding modes by representing the shape of aligand-binding region by means of several to dozens of inscribed spheresand comparing a set of vectors formed by the centers of the spheres witha set of intramolecular vectors of a ligand compound (R. L. Desjarlais,R. P. Sheridan, G. L. Seibel, J. S. Dixon, I. D. Kuntz and R.Venkataraghavan: J. Med. Chem. 31 (1989) 722-729). Using ShapeComplementarity as an Initial Screen in Designing Ligands for aReceptor-Binding Site of Known 3-Dimensional Structure.). In a recentimproved process, the process is modified so that intermolecular energycan be calculated in addition to the score representing coincidencebetween the vectors.

However, this method is disadvantageous because it needs an undue timeeven only for covering all binding modes. Some improved processesinclude alternations so that conformations can be varied for the dockingof a single compound. However, the processes are not designed so as toprovide an automatic procedure covering all of the docking processes.Although energy evaluation can be done concerning hit compounds, onlysimple scores can be calculated and hydrogen bonds or the like aredisregarded during the docking process. Furthermore, its accuracy isinsufficient, i.e., there is a problem in that, for example, the truedocking structure determined by crystal analysis fails to be judged asthe highest rank and is sometimes judged as rather low in rank.Moreover, its unavoidable defect is in that the process is carried outfundamentally by a fixed conformation for each compound, andconformational flexibilities cannot be handled. In a manual operationcannot compensate for the above defect especially, its practical utilityis low in a three-dimensional database search which requires fullautomatic operation for numerous compounds.

The inventors of the present invention developed “ADAM,” a method forestimating the most stable docking structures of a biopolymer and aligand molecule automatically and without any preconception, and theysuccessfully solved all of the aforementioned problems (PCT/JP93/0365,PCT International Publication WO 93/20525; M. Yamada and A. Itai, Chem.Pharm. Bull., 41, p.1200, 1993; M. Yamada and A. Itai, Chem. Pharm.Bull., 41, p.1203, 1993; and M. Yamada Mizutani, N. Tomioka and A. Itai,J. Mol. Biol., 243, pp.310-326, 1994).

In the applications of “ADAM” to several enzyme-inhibitor systems whosestructures had been elucidated by crystallographic analyses, all theresulted docking models with minimum energy perfectly reproduced thestructure in the crystal, and torsion angles of rotatable bonds in theligand molecule and hydrogen-bonding schemes were also accuratelyreproduced. The high accuracy of “ADAM” is based on the calculations ofstructure optimization (energy minimization) performed altogether threetimes. Though the time required may vary depending on the performance ofa computer, numbers of dummy atoms, number of heteroatoms, number ofrotatable bonds and the like and cannot be generalized, it takes fromseveral minutes through approximately 1 hour to obtain an initialdocking model with an ordinary drug molecule by means of a commonly usedworkstation (R4400). This speed is dozens of times faster than thefastest method which handles possible conformations reported so far.

However, although ADAM is suitable in searching for the most stabledocking structure between the target biopolymer and a single ligandmolecule, the method is not suitable for searching novel ligandcompounds from a wide variety of numerous compounds contained in athree-dimensional structure database as it is. Therefore, furtherimprovement has been desired.

Accordingly, an object of the present invention is to provide a methodsuitable for searching novel ligand compounds from a wide variety ofnumerous compounds, thereby solving the problems of the state of theart.

DESCRIPTION OF THE INVENTION

The inventors of the present invention performed various researches. Asa result, they succeeded in developing a process for selecting one ormore promising ligand candidates from an enormous number of trialcompounds: wherein the process comprises the steps of estimating themost stable docking structure for each trial compound; and selecting oneor more ligand candidate compounds from all the trial compounds by givencriteria including the interaction energy between the target biopolymerand the trial compound in the most stable docking structures; and theaforementioned step of estimating the most stable docking structure foreach trial compound comprises the steps of: evaluating all possibledocking structures generated through docking of the trial compound tothe biopolymer, while varying conformation of the trial compound andrepeating structural optimizations, based on interaction energiesbetween the biopolymer and the trial compound, e.g., hydrogen bonds,electrostatic interaction, and van der Waals force. The presentinvention was achieved on the basis of these findings.

The present invention thus provides a method of searching one or moreligand compound to a target biopolymer from a three-dimensionalstructure database, which comprises the steps of:

(i) the first step of assigning hydrogen-bonding category numbers,information for calculating force-field energies, and information forgenerating conformations, to two or more trial compounds, in addition tothree-dimensional atomic coordinates thereof;

(ii) the second step of preparing physicochemical information about aligand-binding region and one or more dummy atoms based on thethree-dimensional atomic coordinates of the target biopolymer;

(iii) the third step of estimating the most stable docking structure,wherein said step further comprises the steps of generating possibledocking structures by docking a trial compound to the biopolymer whilevarying conformations of the trial compound, and evaluating interactionenergy between the target biopolymer and the trial compound, based onthe hydrogen-bonding category number, the information for calculatingforce-field energies, and the information for generating conformationsassigned to the trial compound in addition to the three-dimensionalatomic coordinates according to the step (i) and based on thephysicochemical information about the ligand-binding region prepared inthe step (ii):and then;

(iv) the fourth step of deciding whether or not the trial compoundshould be adopted as a ligand-candidate compound based on given criteriaincluding the interaction energy values between the target biopolymerand the trial compound in the most stable docking structure foundaccording to the step (iii); and

(v) the fifth step of repeating the step (iii) and the step (iv) for allof the trial compounds.

The method of the present invention may comprise the sixth step offurther selecting the ligand-candidate compounds selected in the step(iv) based on different criteria including the number of hydrogen bondsformed between the target biopolymer and/or interaction energy valuesderived therefrom.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 shows a conceptual scheme by featuring the method of the presentinvention.

FIG. 2 is a flowchart according to an embodiment of the method of thepresent invention. In the figure, S represents each step.

FIG. 3 shows examples of the substrate and known inhibitor analogues fora dihydrofolate reductase selected as ligand candidate compounds by themethod of the present invention.

FIG. 4 shows examples of ligand-candidate compounds with novelstructures for a dihydrofolate reductase obtained by the method of thepresent invention.

BEST MODE FOR CARRYING OUT THE INVENTION

In the method of the present invention, the trial compounds are notparticularly limited so long as their structures are known. For example,various compounds contained in currently available databases can beused. The hydrogen-bonding category numbers are identification numbersfor functional groups capable of forming hydrogen bonds, and assigned toheteroatoms that directly participate in hydrogen bonds by thefunctional groups. By referring to the number, a geometric structure ofthe functional group and the nature of the hydrogen bond as well as theposition of partner hydrogen-bonding atoms (dummy atoms) are instantlygenerated. The information for calculating force field energies includesan atom type number to discriminate atomic element and the state ofhybridization of each atom that are used for calculating intramolecularand intermolecular interaction energies by means of molecular forcefield. The information includes an atomic charge assigned to each atom.The information for preparing conformations is used for systematicallygenerating different conformations by varying torsion angles ofrotatable bonds, in the range of the given initial and final values ofthe torsion angles by given values of step angle. It includes, for onebond to be rotated, a set of four atomic numbers constituting thetorsion angle, and the initial and final values of the torsion angle anda step angle of rotation.

The term “biopolymer” means biological macromolecules and includesartificial mimetic molecules based on biological macromolecules. Theterm “physicochemical information about the ligand-binding region” meanspotential influence brought by all of the atoms of the biopolymer insidea cavity space of the biopolymer to which a ligand can bind. The meaningincludes hydrogen-bonding properties of three-dimensional grid pointswithin the ligand-binding region (hydrogen bond acceptor or hydrogenbond donor), and van der Waals interaction energies and electrostaticinteraction energies generated between the biopolymer and probe atomswhen the probe atoms are placed on the three-dimensional grid points.The meaning of “docking” is used to include the step of forming acomplex from a trial compound molecule and a biopolymer, including thestep of finding a stable docking structure formed by the both of them.The docking can generally be performed by interactive methods usingcomputer graphics, as well as by automatic docking methods as the casemay be.

The term “interaction between a biopolymer and a trial compound” is aforce arisen between the biopolymer and the trial compound, whoseexamples include hydrogen bond, electrostatic interaction, van der Waalsinteraction and the like. The term “ligand” or “ligand compound” is acompound with low molecular weight that binds to a biopolymer, whoseexamples include medicaments, substrates and inhibitors of enzymes,coenzymes, and other biologically-active substances.

According to an embodiment of the method of the present invention, theabove third step may comprise the following steps:

(1) step (a) of searching likely binding modes between the biopolymerand the trial compound by preparing all of the combinatorialcorrespondences between hydrogen-bonding heteroatoms in the trialcompound and dummy atoms placed at the positions of heteroatoms that canbe hydrogen-bonding partners of hydrogen-bonding functional groups inthe biopolymer;

(2) step (b) of simultaneously estimating possible hydrogen-bondingschemes between the biopolymer and the trial compound and conformationsof hydrogen-bonding part of the trial compound by comparing thedistances between dummy atoms with the distances between thecorresponded hydrogen-bonding heteroatoms while systematically changingthe conformations of the trial compound; and

(3) step (c) of obtaining the structure of a docking structurecomprising the biopolymer and the trial compound by converting theatomic coordinates of the trial compound to the coordinate system of thebiopolymer for each hydrogen-bonding scheme and conformation obtained inthe step (b) based on the correspondences between the hydrogen-bondingheteroatoms in the trial compound and the dummy atoms.

In addition, according to another embodiment of the method of thepresent invention, the above third step may comprise the followingsteps:

(1) step (a) of searching likely binding modes between the biopolymerand the trial compound by preparing all of the combinatorialcorrespondences between hydrogen-bonding heteroatoms in a partialstructure of the trial compound and the dummy atoms placed at thepositions of heteroatoms that can be hydrogen-bonding partners ofhydrogen-bonding functional groups in the biopolymer;

(2) step (b) of simultaneously estimating hydrogen-bonding schemesbetween the biopolymer and the trial compound and conformations of thepartial structure of the trial compound by comparing the distancesbetween dummy atoms with the distances between the correspondedhydrogen-bonding heteroatoms while systematically changing theconformations of the trial compound;

(3) step (c) of preserving correspondences of the dummy atoms and thehydrogen-bonding heteroatoms that provide impossible hydrogen-bondingschemes in the partial structure of the trial compound, andhydrogen-bonding heteroatoms that cannot form any hydrogen-bond with thedummy atoms based on the hydrogen-bonding schemes and conformationsobtained in the step (b);

(4) step (d) of searching likely binding modes between the biopolymerand the trial compound by preparing all of the combinatorialcorrespondences between dummy atoms and hydrogen-bonding heteroatoms ina the whole structure of the trial compound, excluding thecorrespondences containing the hydrogen-bonding heteroatoms preserved inthe step (c) and the correspondences containing the dummy atoms andhydrogen-bonding heteroatoms preserved in the step (c);

(5) step (e) of simultaneously estimating possible hydrogen-bondingschemes between the biopolymer and the trial compound and conformationsof hydrogen-bonding part of the trial compound by comparing thedistances between dummy atoms with the distances between correspondedhydrogen-bonding heteroatoms while systematically changing theconformations of the trial compound; and

(6) step (f) of obtaining the structure of a docking structurecomprising the biopolymer and the trial compound by converting theatomic coordinates of the trial compound to the coordinate system of thebiopolymer for each hydrogen-bonding scheme and conformations obtainedin the step (e) based on the correspondences between thehydrogen-bonding heteroatoms in the trial compound and the dummy atoms.

When the third step includes the above sub-steps, it becomes possible toaccelerate the search for stable docking structures consisting of thebiopolymer and the trial compound, and in addition, it also becomepossible to search for stable docking structures including the moststable docking structure in short time even where the biopolymer and/orthe trial compounds have complicated structures.

According to a further embodiment of the method of the presentinvention, the above third step may comprise the following steps:

(1) step (a) of searching likely binding modes between the biopolymerand the trial compound by preparing all of the combinatorialcorrespondences between hydrogen-bonding heteroatoms in the trialcompound and dummy atoms placed at the positions of heteroatoms that canbe hydrogen-bonding partners of hydrogen-bonding functional groups inthe biopolymer;

(2) step (b) of simultaneously estimating possible hydrogen-bondingschemes between the biopolymer and the trial compound and conformationsof hydrogen-bonding part of the trial compound by comparing thedistances between dummy atoms with the distances betweenhydrogen-bonding heteroatoms while systematically changing theconformations of the trial compound;

(3) step (c) of optimizing the conformations of the trial compound sothat the positions of the dummy atoms and the hydrogen-bondingheteroatoms in the trial compound can be in accord with each other whileretaining the hydrogen-bonding scheme obtained in the step (b), and thenexcluding the conformations of the trial compound having highintramolecular energies;

(4) step (d) of obtaining the structure of a complex comprising thebiopolymer and the trial compound by converting the entire atomiccoordinates of the trial compound to the coordinate system of thebiopolymer for each conformation not excluded in the step (c) based onthe corresponding relationships between the hydrogen-bonding heteroatomsin the trial compound and the dummy atoms;

(5) step (e) of excluding from docking structures of hydrogen-bondingparts obtained in the step (d) the docking structures in whichintramolecular energies of the hydrogen-bonding parts of the trialcompound and intermolecular energies between the biopolymer and thehydrogen-bonding parts of the trial compound are high, and then carryingout structure optimizations of the remaining docking structures;

(6) step (f) of obtaining new docking structures comprising the wholetrial compound by generating the conformations of non-hydrogen-bondingparts of the trial compound for each docking structure obtained in thestep (e); and

(7) step (g) of excluding from the docking structures obtained in thestep (f) the docking structures in which intramolecular energies of thewhole structure of the trial compound and intermolecular energiesbetween the biopolymer and the trial compound are high, and thencarrying out structure optimizations of the remaining dockingstructures.

When the third step includes the aforementioned sub-steps, it becomespossible to select appropriate docking structures even when reducednumbers of conformations are initially generated, and thereby obtainstable docking structures with high accuracy.

In the above methods, the hydrogen-bonding functional groups includefunctional groups or atoms that can participate in the formations ofhydrogen bonds. The term “hydrogen-bonding heteroatoms” meansheteroatoms which exist in the hydrogen-bonding functional groupspresent in the trial compounds. The term “hydrogen-bonding part” means apartial structure of the trial compound structure which comprises allthe hydrogen-bonding heteroatoms that are corresponded to dummy atoms,and the term “non-hydrogen-bonding part” means a partial structure otherthan the hydrogen-bonding part.

According to a still further embodiment of the present invention, thereis provided a method for preparing a three-dimensional structuredatabase which comprises the step of optionally adding three-dimensionalcoordinates of hydrogen atoms, and adding atom type numbers,hydrogen-bonding category numbers, atomic charge, and modes of bondrotation in addition to three-dimensional coordinates of trialcompounds.

Preferred embodiments of the present invention will be explained belowby referring to the flowchart shown in FIG. 2. In FIG. 2, the symbol “S”represents each step and the name “ADAM” represents the method forsearching a stable docking structure of biopolymer and ligand moleculesdisclosed in International Publication WO93/20525 (InternationalPublication Date: Oct. 14, 1993). The disclosure of the specificationand claims of the International Publication WO93/20525 is hereinincorporated by reference.

The “ADAM” format three-dimensional database is first prepared whichcomprises at least two trial compounds. The aforementioned database canbe prepared, for example, by the method explained below.Three-dimensional coordinates of trial compounds are first inputted. Asthe three-dimensional coordinates of the trial compounds, coordinatesobtained by converting two-dimensional coordinates obtainable from atwo-dimensional database to a three-dimensional coordinates throughforce-field calculation, or coordinates obtained from structuralanalysis of a single crystal as well as those from crystal database, orthose from three-dimensional structures obtained by model buildingprocedure based on energy calculation may be used. As regardsthree-dimensional structures of each trial compound, conformations arebasically not necessary to be taken into account so long as geometricvalues such as bond distances and bond angles between the atoms areaccurate. However, in order to calculate atomic charge of each atom inthe trial compound as accurately as possible, it is preferred to inputatomic coordinates of a relatively stable structure.

After inputting the above three-dimensional coordinates of the trialcompounds, a three-dimensional database can be prepared automatically byadding hydrogen atoms if necessary, and assigning a force-fieldatom-type number to each atom, hydrogen-bonding category numbers tohydrogen-bonding heteroatoms, atomic charge to each atom, and modes ofbond rotations (for example, information about covalent bonds to berotated, values of initial, final, and increment torsion angles of therotation).

The force-field atom-type numbers for each atom in the trial compoundcan be assigned, for example, according to the numbering method ofWeiner et al. (Weiner, S. J., P. A., Case, D. A., Singh, U. C., Ghio.C., Alagona, G., Profeta, S., Jr., and Weiner, P., J. Am. Chem. Soc.,106, 1984, pp.765-784). The hydrogen-bonding category numbers forhydrogen-bonding heteroatoms in the trial compound can be assignedaccording to the following Table 1. The atomic charge of each atom inthe trial compound can be calculated, for example, by the Gasteigermethod or by molecular orbital calculation with MNDO, AM1 or othermethod in the MOPAC program. As for the mode of bond rotation, it ispreferred to indicate to rotate all rotatable bonds systematically withrotation-angle intervals of 60° to 120°.

TABLE 1 A List of Hydrogen-Bonding Hereroatoms and theirHydrogen-Bonding Category Numbers included in Hydrogen-BondingFunctional Groups Hydrogen-Bonding Heteroatoms included in CategoryNumber Hydrogen-Bonding Functional Groups  1 sp² N in Primary Amines  2sp³ N in Primary Amines  3 sp³ N in Ammonium Ions  4 sp² N in Amides  5sp³ N in Secondary Amines  6 N in Aromatic Groups  7 Protonated N inAromatic Groups  8 N in Tertiary Amines  9 Protonated N in TertiaryAmines 10 O in Hydroxyls with Rotatable C-O bond 11 O in Ethers 12 O inCarbonyls 13 O in Carboxylate Anions 14 O in Carboxylic Acids 15 O inPhosphates 16 O in Water Molecules 17 S in Mercaptos 18 S in Thioethers19 O in Hydroxyls with fixed hydrogen atom positions

Then, the number of atoms in the biopolymer and their atomic coordinates(except for atomic coordinates of hydrogen atoms) are inputted (S2). Asregards the atomic coordinates of the biopolymer, three-dimensionalinformation obtained by X-ray crystallographic analyses or NMR analyses,information obtainable from the protein crystal structure database andthe like, or alternatively, atomic coordinates of a biopolymer modelconstructed on the basis of such information or the like may be used.The atomic coordinates of the biopolymer are preferably provided as athree-dimensional coordinate system. In addition, cofactors that bind tothe biopolymer and that structurally as well as functionally playimportant roles may be considered as a part of the biopolymer, and theatomic coordinates of the biopolymer in the state of being bound withthe cofactor may be inputted to perform the successive steps. Examplesof the cofactors include coenzymes, water molecules, metal ions and thelike.

Following the above step, the atomic coordinates of the hydrogen atomsin amino acid residues present in the biopolymer are calculated (S3). Ingeneral, it is difficult to determine the positions of hydrogen atoms inbiopolymers by experimental procedures such as X-ray crystallographicanalyses or NMR analyses. In addition, the information about hydrogenatoms is not available from protein crystal structure databases or thelike. Therefore, it is necessary to calculate the atomic coordinates ofthe hydrogen atoms in amino acid residues on the basis of the structuresof the amino acid residues present in the biopolymer. For hydrogen atomswhose atomic coordinates cannot be uniquely determined as they bind torotatable group of atoms in the amino acid residues, their atomiccoordinates are preferably calculated on the assumption that they existat trans positions.

Subsequently, atomic charges are appended to the atoms in amino acidresidues present in the biopolymer (S4), and then hydrogen-bondingcategory numbers are assigned to the heteroatoms in hydrogen-bondingfunctional groups present in the biopolymer (S5). As the values of theatomic charges, reported values calculated for each amino acid, forexample, values reported by Weiner et al. can be used (Weiner, S. J.,Koliman, P. A., Case, D. A., Singh, U. C., Ghio, C., Alagona, G.,Profeta, S., Jr. and Weiner, P., J. Am. Chem. Soc., 106, 1984, pp.765-784). The hydrogen-bonding category numbers can be assignedaccording to the Table 1.

A ligand-binding region is then specified (S6). As the ligand-bindingregion, spaces containing any part of the biopolymer can be specified byusing preferably a rectangular box. Depending on the purposes, aligand-binding pocket and its surrounding region of the biopolymer canbe specified. If desired, a region containing any site in the biopolymerwhich binds a different molecule such as an effector and the like canalso be specified. The term “ligand-binding region” means a cavity onthe concaved surface of the biopolymer to which a ligand molecule suchas a substrate or an inhibitor bind. For assigning the range of theligand-binding region, a part of the functions of the program “GREEN”can be used (Tomioka, N., Itai, A. and Iitaka, Y., Journal ofComputer-Aided Molecular Design, vol. 1, 1987, pp.197-210).

Three-dimensional grid points are generated inside the ligand-bindingregion specified in the step S6, and for each three-dimensional gridpoint, an address number is assigned and grid point information iscalculated (S7). The term “three-dimensional grid points” means thepoints on the three-dimensional lattice generated at a certain regularintervals within the ligand-binding region in the biopolymer. Themeaning of “grid point information” includes local physicochemicalcharacter at each grid point in the ligand-binding region such as vander Waals interaction energies and electrostatic interaction energiesarisen between the biopolymer and probe atoms, as well ashydrogen-binding properties. Wherein the van der Waals and electrostaticinteraction energies are calculated on the assumption that probe atomsexist on each three-dimensional grid point. By using the grid pointinformation on the three-dimensional grid points, it becomes possible toaccelerate approximate calculations of the intermolecular interactionsbetween the biopolymer and the trial compound to be carried out in thesuccessive steps. In addition, reasonable determination of the positionsof the dummy atoms, which is provided in the following steps, becomesachievable. As a result, possible docking structures comprising thebiopolymer and the trial compound can be entirely searched in a shorttime.

The three-dimensional grid points may be generated at intervals of0.3-2.0 angstroms, preferably 0.3-0.5 angstroms in the region specifiedin the step S6. As the probe atoms, all sorts of atoms contained inpossible ligand compounds are preferably included. The van der Waalsinteraction energy that acts between each atom placed on thethree-dimensional grid points and the biopolymer can be calculatedaccording to a conventional atom-pair type calculation using anempirical potential function. As the empirical potential function, theLennard-Jones type function represented by the following equation can beused.$G_{{vdw},i} = {\sum\limits_{j}\left( {{A/r_{ij}^{12}} - {B/r_{ij}^{6}}} \right)}$

In the formula, symbol “i” is a sequential number representing a probeatom; and symbol “j” is a sequential number for the biopolymer atoms.Symbols “A” and “B” are parameters for determining the location and themagnitude of minimum potential. Symbol “r_(ij)” is an inter-atomicdistance between the i-th probe atom and the j-th atom in thebiopolymer. As the parameters “A” and “B,” the values reported by Weineret al. can be used (Weiner, S. J. Kollman, P. A., Case, D. A., Singh, U.C., Ghio, C. Alagona, G., Profeta, S., Jr. and Weiner, P., J. Am. Chem.Soc., 106, 1984, pp.765-784).

The van der Waals interaction energy between the trial compound and thebiopolymer, when the trial compound is placed on the three-dimensionalgrid points, can be calculated by the following equation:$E_{vdw} = {\sum\limits_{m = 1}^{N}G_{{vdw},m_{0}}}$

wherein “m” is the sequential number for each atom in the trialcompound; N is the total number of atoms in the trial compound; and “m₀”is the sequential number of the three-dimensional grid point that isclosest to the m-th atom.

The electrostatic interaction energy that generates between each probeatom placed on each three-dimensional grid point and the biopolymer canbe calculated by the following equation:$G_{{elc},i} = {\sum\limits_{j}{{{Kq}_{j}/ɛ}\quad r_{ij}}}$

wherein symbols “i,” “j,” and “r_(ij)” are the same as those definedabove; “q_(j)” is the atomic charge of the j-th atom in the biopolymer;“K” is a constant for converting an energy unit; and “ε” is a dielectricconstant. As the dielectric constant, a fixed value may be used, oralternatively, it is preferably to use a value depending on r_(ij), asproposed by Warshel et al. (Warshel, A., J. Phys. Chem., 83, 1979,pp.1640-1652).

The electrostatic interaction energy between the trial compound and thebiopolymer, when the trial compound is placed on the three-dimensionalgrid points, can be calculated by the following equation:$E_{elc} = {\sum\limits_{m = 1}^{N}{q_{m}G_{{elc},m_{0}}}}$

wherein m, N and m₀ are as defined above.

The term “hydrogen-bonding grid point information” means informationindicating that either hydrogen-donor or hydrogen-acceptor atom can forma hydrogen bond with the hydrogen-bonding functional group in thebiopolymer if it is placed on the three-dimensional grid point, oralternatively that both hydrogen-donor and hydrogen-acceptor atom can(or cannot) form a hydrogen bond with the hydrogen-bonding functionalgroup in the biopolymer if it is placed on the three-dimensional gridpoint. For example, the hydrogen-bonding information ofthree-dimensional grid points can be expressed by assigning number 1 fora three-dimensional grid point as a hydrogen-bond acceptor, number 2 fora three-dimensional grid point as a hydrogen-bond donor, number 3 on athree-dimensional grid having both properties, and number 0 on athree-dimensional grid having no hydrogen-bonding property.

The hydrogen-bonding grid point information can be obtained as set forthbelow. If the distance “DP” between a three-dimensional grid point “P”and a hydrogen-donating atom “D” in the biopolymer is within a range(e.g., 2.5-3.1 Å) that allows for the formation of a hydrogen bond, andif the angle ∠DHP made by the three-dimensional grid point “P”, hydrogen“H,” and atom “D” is within a range (e.g., more than 30°) that allowsfor the formation of a hydrogen bond, the three-dimensional grid pointis determined to have property of hydrogen-bond acceptor. Similarly, ifthe distance “PA” between three-dimensional grid point “P” andhydrogen-accepting atom “A” in the biopolymer is within a range thatallows for the formation of a hydrogen bond and if the angle ∠ALP whichis made by the three-dimensional grid point “P,” a lone-pair electron“L,” and the hydrogen-accepting atom “A” is within a range that allowsfor the formation of a hydrogen bond, the three-dimensional grid pointis determined to have the property of hydrogen-bond acceptor. If athree-dimensional grid point is proved to have neither the property ofhydrogen-bond donor nor hydrogen-bond acceptor, the point is assumed tohave no hydrogen-bonding property.

Hydrogen-bonding functional groups that are expected to form hydrogenbonds with the trial compound are selected from hydrogen-bondingfunctional groups present in the region of the biopolymer specified inthe step S6 (S8). If a lot of hydrogen-bonding functional groups arepresent, they can be selected depending on the degree of importance.Furthermore, one or more dummy atoms are set to each hydrogen-bondingfunctional group selected in step S8 based on the three-dimensional gridpoint information calculated in S7 (S9).

In this step, a region in which a hydrogen-bond can be formed with eachhydrogen-bonding functional group selected in step S8 (hydrogen-bondingregion) is first determined based on the hydrogen-bonding grid pointinformation calculated in step S7, and then a dummy atom is placed atthe center of the hydrogen-bonding region comprising appropriate numbersof three-dimensional grid points, e.g., 5-20. The dummy atom ispositioned within the hydrogen-bonding region and outside the van derWaals radii of other atoms. The hydrogen-bonding region is defined as agroup of three-dimensional grid points that are neighboring to eachother and have the same hydrogen-bonding properties. It should be notedthat two or more dummy atoms may sometimes be placed from a singlehydrogen-bonding functional group, or in contrast, no dummy atom may begenerated. If the number of generated dummy atoms is large, it ispreferred to reduce the number to 10 or less, more preferably 5 to 10,by selecting dummy atoms present in a bottom of the cavity of theligand-bonding region considering the importance. To each dummy atom,the same hydrogen-bond property is assigned as that of thehydrogen-bonding region to which the dummy atom belongs.

Then, a trial compound is selected (S10) and then three-dimensionalcoordinates, hydrogen-bonding category numbers, information formolecular force-field calculation, and information for conformationgeneration are read from the “ADAM” format database prepared in S1(S11). Then, correspondences are made combinatorially between the dummyatoms and the hydrogen-bonding heteroatoms in the trial compound (S12).If the number of dummy atoms is “m” and the number of hydrogen-bondingheteroatoms in the trial compound is “n” , the number of thecorrespondences (combination) represented by N(i), in which number ofhydrogen-bonds formed is “i”, is mPi×nCi wherein symbol “P” representspermutations and symbol “C” represents combinations. It is preferred togenerate all of the possible combinations of the dummy atoms and thehydrogen-binding heteroatoms in the trial compound.

Then, for all i's that satisfy the relationship 1_(min)≦i≦1_(max), thesteps S12-S39 are repeated. As a result, a total of$\sum\limits_{i = I_{\min}}^{I_{\max}}{N(i)}$

sets of correspondences of dummy atoms and hydrogen-bonding heteroatomsin the trial compound are generated. By these procedures, all possiblecombinations of the hydrogen-bonds formed between the biopolymer and thetrial compound are generated, and thereby all the binding modes betweenthe trial compound and the biopolymer can be searched systematically andefficiently. The maximum number (1_(max)) and the minimum number(1_(min)) of hydrogen-bonds that should be considered between thebiopolymer and the ligand molecule are selected. Any numbers may beselected and it is preferred to select values near the number ofhydrogen-bonds that is expected from the number of hydrogen-bonds formedbetween the biopolymer and known ligand molecules.

More specifically, one of the correspondences of the dummy atoms and thehydrogen binding heteroatoms generated in S12 is selected (S13). In thisstep, correspondences in which the hydrogen-bonding properties of thedummy atoms do not match those of the hydrogen-bonding heteroatoms inthe trial compound are not selected. Then, interatomic distances arecalculated for all pairs of dummy atoms contained in the correspondenceselected in S13 (S14) If both the number of dummy atoms and the numberof hydrogen-bonding heteroatoms in the trial compound are 1, the processjumps to step S22 skipping steps S14-S21 and then jumps to step S26skipping steps S23-S25 If either of the number of dummy atoms or thenumber of hydrogen-bonding heteroatoms in the trial compound is 1, theprocess jumps to step S22 skipping steps S14-S21.

The trial compound molecule is divided into a hydrogen-bonding part anda non-hydrogen-bonding part (S15). and rotatable bonds and theirrotation modes in the hydrogen-bonding part divided in step S15 areselected (S16). Then, conformations of the trial compound are generatedsuccessively by rotating the bonds selected in step S16 according to therotation modes inputted in the three-dimensional “ADAM” format databaseprepared in S1 (S17) For each generated conformation, subsequent stepsS18-S29 are performed.

These steps will be detailed below. For each conformation generated instep S17. interatomic distances are calculated for all pairs ofhydrogen-bonding heteroatoms in the trial compound contained in thecorrespondence selected in step S13 (S18). The interatomic distances ofhydrogen-bonding heteroatoms are compared with those of correspondeddummy atoms If the value of “F”, which is a sum of the squares of thedifferences between the dummy-atom/dummy-atom distances as calculated instep S14 and the heteroatom/heteroatom distances of the trial compoundas calculated in step 18, is outside a given range, the correspondenceis excluded as inappropriate in the conformation (S19). That is, when Fis calculated by the following equation:$F = {\sum\limits_{i = 1}^{{n{({n - 1})}}/2}\left( {r_{li} - r_{di}} \right)^{2}}$

hydrogen-bonding correspondences and conformations of the trial compoundwhich do not satisfy the following condition are excluded:k_(a) ≦ F ≦ k_(a)^(′)

wherein “n” is the number of hydrogen bonds; “r_(di)” is the i-thdistance between dummy atoms; and “r_(li)” is the i-th distance betweenhydrogen-bonding heteroatoms in the trial compound; “k_(a)” is the lowerlimit of F; and “k_(a)′” is the upper limit of F. The value of k_(a) ispreferably 0.6-0.8 and that of k_(a)′ is preferably 1.2-1.4. In thisstep, possible hydrogen-bonding schemes between the biopolymer and thetrial compounds and possible conformations of the ligand molecule can besearched efficiently and thoroughly.

Then, for the hydrogen-bonding correspondences and conformations notexcluded in step S19, the conformation of the hydrogen-bonding part ofthe trial compound is optimized by minimizing the value of F (S20). Withthis step, the conformation of the trial compound is corrected so thatthe biopolymer and the trial compound can form stable hydrogen bonds.The conformation of the hydrogen-bonding part of the trial compound canbe optimized by minimizing the value of F using the Fletcher-Powellmethod (Fletcher, R., Powell, M. J. D., Computer J., 6, 1968, p.163),treating the torsion angles of the rotatable bonds present in thehydrogen-bonding part as variables.

Then, the intramolecular energy is calculated for the hydrogen-bondingpart of the trial compound optimized in step S20, and the conformationswith intramolecular energies higher than given threshold are excluded(S21). For example, when the intramolecular energy of thehydrogen-bonding part of the trial compound is calculated by using themolecular force field of AMBER 4.0, conformations with an intramolecularenergy of more than 100 kcal/mol are preferably excluded. Then, theatomic coordinates of the trial compound is converted to the coordinatesystem of the biopolymer so that the coordinates of the hydrogen-bondingheteroatoms in the trial compound in the conformation obtained in stepS21 coincide with those of the corresponded dummy atoms (S22). TheKabsh's method of least squares can be used for this step (Kabsh, W.,Acta Cryst., A32, 1976, p.922; Kabsh W., Acta Cryst., A34, 1978, p.827).By these procedures, likely hydrogen-bonding schemes and theconformations of the hydrogen-bonding part of the trial compound can beestimated roughly at the same time.

Following the above step, the intermolecular interaction energy betweenthe biopolymer and the hydrogen-bonding part of the trial compound (thesum of a van der Waals interaction energy and an electrostaticinteraction energy) and the intramolecular energy of thehydrogen-bonding part of the trial compound are calculated (S23). Theintermolecular interaction energy between the biopolymer and thehydrogen-bonding part of the trial compound E_(inter) can be calculatedby the following equation:$E_{inter} = {\sum\limits_{k}\left\lbrack {{G_{vdw}\left( k_{0} \right)} + {{G_{elc}\left( k_{0} \right)} \cdot q_{k}}} \right\rbrack}$

wherein “k” is a sequential number for the atoms in the trial compound;“G_(vdw)(k₀)” and “G_(elc)(k₀)” are the van der Waals and theelectrostatic interaction energies of the grid point information at thegrid point at k₀ closest to the atom “k”; “qk” is the atomic charge ofthe atom k.

The intramolecular energy of the hydrogen-bonding part of the trialcompound E_(intra), can be calculated by known methods. For example,E_(intra) can be calculated by the following equation using a reportedforce field such as AMBER4.0:$E_{intra} = {{\sum\limits_{dihedrals}{\sum\limits_{n}{\frac{V_{n}}{2}\left\lbrack {1 + {\cos \left( {{n\quad \Phi} - \gamma} \right)}} \right\rbrack}}} + {\sum\limits_{i < j}\left( {{A_{ij}/R_{ij}^{12}} - {B_{ij}R_{ij}^{6}}} \right)} + {\sum\limits_{i < j}\left( {q_{i}{q_{j}/ɛ}\quad R_{ij}} \right)}}$

wherein “V_(n)” is a constant provided for force-field atom types offour atoms constituting a torsion angle; “n” is a constant representingthe symmetry of the torsion-angle potential; Φ is a torsion angle, γ isthe phase of the torsion-angle potential (provided for force-field atomtypes of four atoms constituting a torsion angle); “A_(ij)” and “B_(ij)”are constants for a pair of force-field atom types of the i-th and j-thatoms in the trial compound; “R_(ij)” is the distance between the i-thand j-th atoms; q_(i) is the atomic charge on the i-th atom in the trialcompound; q_(j) is the atomic charge on the j-th atom in the trialcompound, and ε is a dielectric constant.

Conformations of the trial compound are excluded if the sum of theintermolecular interaction energy and intramolecular energy ascalculated in step S23 is higher than a given threshold (S24). Forexample, conformations in which the sum of these energies is higher than10 kcal/mol per 100 dalton of molecular weight can be excluded. Then,the structure of the hydrogen-bonding part of the trial compound isoptimized if it was not excluded in step S24 (S25). The structures ofthe hydrogen-bonding part of the trial compound can be optimized byoptimizing the torsion angles of the hydrogen-bonding part of the trialcompound as well as the relative location and orientation of the trialcompound.

For example, the structure of the hydrogen-bonding part of the trialcompound can be optimized by the Powell method (Press, W. H., Flannery,B. P., Teukolsky, S. A., Vitterling, W. T., “Numerical Recipes in C”,Cambridge University Press, Cambridge, 1989) or the like, so that thetotal energy E_(total) calculated by the following formula is minimized:

E_(total)=E_(inter)+E_(intra)+W_(hb)·N_(hb)·C_(hb)

wherein E_(inter) and E_(intra) are the same as those defined above;“W_(hb)” is a weight; “N_(hb)” is the number of hydrogen-bonds; and“C_(hb)” is a constant energy value assigned for one hydrogen-bond(e.g., 2.5 kcal/mol).

Then, conformations of the trial compounds are successively generated byrotating the rotatable bonds in the non-hydrogen-bonding part of thetrial compound according to the rotation mode inputted in the “ADAM”format three-dimensional database prepared in S1 (S26), and stepsS27-S30 are repeated for each generated conformation. Specifically, theintermolecular interaction energy between the biopolymer and the trialcompound and the intramolecular energy of the trial compound arecalculated (S27), and conformations of the trial compound are excludedif the sum of the intermolecular interaction energy and intramolecularenergy as calculated in S27 is higher than a given threshold (S28). Theentire structure of the trial compound having conformations not excludedin step S28 are then optimized (S29), and the energies are calculatedfor docking structures of the biopolymer and the trial compound obtainedby S29, and the most stable docking structure with the minimum energy isselected (S30).

The intermolecular interaction energy and intramolecular energy can becalculated in the same manner as S23, provided that energies should becalculated for the whole structure of the trial compound in S27, whereasenergies are calculated only for the hydrogen-bonding part of the trialcompound in S23. As the upper limit for the sum of these energies, forexample 10 kcal/mol per 100 dalton of molecular weight can be used.Through this step, stable docking structures comprising the biopolymerand the trial compound as well as active conformations of the trialcompound can be obtained. The optimization of the structures of thetrial compound can be carried out in the same manner as S25, providedthat the optimization should be performed for the whole structure of thetrial compound in S29, whereas the optimization is directed only to thehydrogen-bonding part of the trial compound in S25.

The docking process of the biopolymer and the trial compound asdescribed above can be carried out by using the automated dockingprogram “ADAM” (PCT International Publication WO93/20525; Yamada, M. andItai, A., Chem. Pharm. Bull., 41, 1993, p.1200; Yamada, M. and Itai, A.,Chem. Pharm. Bull., 41, 1993, p.1203; Yamada Mizutani, M., Tomioka, N.,and Itai, A., J. Mol. Biol., 243, 1994, pp.310-326).

Whether or not the trial compound should be selected as aligand-candidate compound is decided based on all or part of the givencriteria including the intermolecular interaction energy between thebiopolymer and the trial compound, the intramolecular energy of thetrial compound, number of hydrogen bonds, number of rings and others,which is evaluated in the most stable docking structure obtained in S30(S31). Examples of the intermolecular interaction energy includeelectrostatic interaction energy, van der Waals interaction, hydrogenbond energy, and sum of them. For example, a trial compound with sum ofthe electrostatic and van der Waals interaction energies below −2kcal/mol per molecular weight of 100 dalton can be selected as aligand-candidate compound. Then, judgement is performed whether or notall of the trial compounds have been subjected to the selection (S32),and if not all of the trial compounds have been subjected to theselection, the steps of S10-S31 are repeated by returning to the stepS10, and if the selection has been completed, the process proceeds tothe step of S33.

More promising ligand-candidate compounds are further selected from theligand-candidate compounds selected in S31 (S33). Prior to the furtherselection, it is convenient to decide another criteria for the furtherselection by referring to a list of information about theligand-candidate compounds in their most stable docking structures,e.g., the intermolecular interaction energy between the biopolymer andthe trial compound, the intramolecular energy of the trial compound, thenumber of hydrogen bonds, and the number of rings and others. Forexample, a further selection is preferably carried out using more severecriteria comprised of the number of hydrogen bonds that are formedbetween the biopolymer and the ligand-candidate compound, theintermolecular interaction energy between the biopolymer and theligand-candidate compound, and the number of atoms.

Examples of the intermolecular interaction energy include electrostaticinteraction energy, van der Waals interaction, and sum of them. Forexample, trial compounds with the sum of the electrostatic and van derWaals interaction energies below −8 kcal/mol per molecular weight of 100dalton are preferably selected as ligand-candidate compounds. Althougharbitrary number can be specified as a criterion of the number ofhydrogen bonds depending on the sort of the biopolymer, it is desirableto apply a number of, for example, 2 or more, more preferably 3 or moreas the criterion. Criterion about the number of atoms may also bearbitrary, but the number of, for example, 20 or more, preferably in therange of 20 to 40 can be specified as the criterion.

If too many dummy atoms are generated from the biopolymer, or if thetrial compound has rather high conformational flexibility or has a lotof heteroatoms, according to a more preferred embodiment of the presentinvention, information about the trial compound is inputted according tothe step S9, and then steps S13-S25 are carried out for a partialstructure of the trial compound and preserve information aboutcorrespondences of dummy atoms and hydrogen-bonding heteroatoms of thetrial compound that provide impossible hydrogen-bonding schemes andhydrogen-bonding heteroatoms that cannot be corresponded with any dummyatom in the partial structure. Although the partial structure of thetrial compound can be arbitrarily specified without any structuralrestrictions, a partial structure containing at least threehydrogen-bonding functional groups is preferred. Presence or absence ofconformational flexibility in the partial structure may be disregarded.

Subsequent to the above step, the steps of S13-S33 are performed for thewhole structure of the trial compound, excluding, from the set ofcorrespondences prepared in the step S12, the correspondences containingthe hydrogen-bonding heteroatoms and the correspondences of dummy atomsand hydrogen-bonding heteroatoms preserved in the previous step. As aresult, the number of the correspondences and conformations to beexamined can be reduced, thereby time for structural searching forstable docking structure of the biopolymer and the trial compound can begreatly shortened (this method is sometimes referred to as the“Pre-Pruning method (PP method)”). The PP method is based on theassumption that any hydrogen-bonding schemes and ligand conformationsthat are impossible in the partial structure of the trial compound arealso impossible in the whole structure of the trial compound. Byapplying the PP method, the time for the search can be remarkablyshortened without affecting the accuracy and reliability of the resulteddocking structure of the biopolymer and the trial compound.

EXAMPLE

In order to verify the effectiveness of the method of the presentinvention, calculations were carried out using a dihydrofolate reductaseas the target biopolymer as to whether or not the substrates or knowninhibitors, or analogues thereof can be selected as ligand-candidatecompounds.

As the atomic coordinates of the dihydrofolate reductase (hereinafterreferred to as “DHFR”), the values obtained from the atomic coordinatesof the binary complex of E. coli DHFR and methotrexate available fromthe Protein Data Back entry 4DFR were used. The atomic coordinates of acofactor NADP⁺ molecule were added to the atomic coordinates of thebinary complex on the basis of the crystal structure of the ternarycomplex of DHFR, folic acid, and NADP⁺. All water molecules wereeliminated from the atomic coordinates of the binary complex, except forthe two water molecules that strongly bind to DHFR and cannot bechemically removed. One of the two water molecules not eliminated bindsto tryptophan 21 and aspartic acid 26 of DHFR through hydrogen-bonds,and the other binds to leucine 114 and threonine 116.

As a database of trial compounds, the Available Chemicals Directory(ACD-3D, MDL) containing about 110,000 molecule data was used. Among thecompounds stored in the database, the following molecules were excluded:molecules with 5 or less atoms; molecules consist of only H, C, Si andhalogen atoms; molecules containing atoms other than H, C, N, O, S, Pand halogen atoms; molecules with 6 or more rotatable bonds; andmolecules having no ring structure, and 10,017 compounds were subjectedto the search. The data were supplemented with hydrogen atoms, andforce-field atom-type numbers, hydrogen-bonding category numbers, atomiccharge, and bond-rotation modes (rotation with a rotation angle of 120°)were assigned. The criteria for the selection of ligand-candidatecompounds were as follows: the intermolecular interaction energy betweenthe target biopolymer (the enzyme) of −7 kcal/mol or less per molecularweight of 100 dalton, 3 or more intermolecular hydrogen bonds, and 25 ormore number of atoms.

According to the method of the present invention, about 1400 trialcompounds were finally selected as ligand-candidate compounds, and theprocess required about 80 hours from inputting of three-dimensionalcoordinates of the biopolymer until the completion of the selection ofthe ligand-candidate compounds. These ligand-candidate compoundsincluded the compounds shown in FIG. 3, together with other numerousanalogues to substrates and known inhibitors. An inhibitor,methotrexate, and the substrate, dihydrofolate, were not selected fromthis search because of the limitation about the number of rotatablebonds, while trimethoprim was selected as a ligand candidate compound.The binding modes and the conformations of the ligand candidatecompounds selected by the method of the present invention agreed wellwith the binding modes and the conformations deduced from the crystalstructures of the complexes comprising the substrate or the inhibitoranalogue together with the enzyme.

According to the method of the present invention, numerousligand-candidate compounds with novel structures quite dissimilar to thesubstrate and the known inhibitors were selected. A part of examplesthereof are shown in FIG. 4 together with the estimated dockingstructures. In the figure, the bold line indicates the molecularskeletons of the selected ligand-candidates , the dotted line indicateshydrogen bonds, the fine solid line indicates a part of the biopolymer,and the cage depicted with the broken line indicates regions allowed forthe centers of the atoms of the ligand. From these docking structures,it was found that that several hydrogen bonds were formed and highcomplementarities with the enzyme cavity were achieved by selectedligand-candidates. Therefore, these ligand candidate compounds arepromising lead compounds as inhibitors against dihydrofolate reductase.

Industrial Applicability

According to the method of the present invention, ligand compounds tothe target biopolymer whose structure is already known, can beidentified from a three-dimensional structure database in a short time,while considering conformational flexibility, thereby extremely reliablesearching results can be easily obtained. The method of the presentinvention is thus useful in the field of medical and biological sciencesfor the creation of lead compounds having activities as inhibitors,agonists, or antagonist to biopolymers.

What is claimed is:
 1. A method of selecting at least oneligand-candidate compound suitable for binding to a target biopolymer,from a compound database comprising information of three-dimensionalcoordinates of at least two database entries in arbitrary conformationcomprising: (a) estimating a most stable docking structure between thetarget biopolymer and each of at least two entries as a trial compound,wherein the estimating comprises performing an automated search; and (b)determining whether the trial compound is a candidate based on the moststable docking structure.
 2. The method of claim 1, wherein the compounddatabase comprises at least 10,000 compounds.
 3. The method of claim 1,wherein the compound database comprises information for calculatinginteraction energy and for generating conformations.
 4. The method ofclaim 1, wherein the estimating comprises consideration of all possiblebinding modes and conformations of the trial compound to obtain the moststable docking structure.
 5. The method of claim 4, wherein theestimating further comprises at least one energy minimization process toobtain the most stable docking structure with optimized binding modesand torsion angles.
 6. The method of claim 5, wherein the estimatingcomprises at least three energy minimization processes, eachminimization process comprising repeated processes.
 7. The method ofclaim 5, wherein the estimating comprises an automatic docking method.8. The method of claim 7, wherein the automatic docking method is the“ADAM” process.
 9. The method of claim 1, wherein at least oneligand-candidate compound is chosen based on the most stable dockingstructure.
 10. The method of claim 9, wherein at least oneligand-candidate compound is chosen based on stability and structuralfeatures of the most stable docking structure.
 11. The method of claim10, wherein the features comprise a number of hydrogen bonding andinteraction energies between the target biopolymer and the trialcompound.
 12. The method of claim 11, further comprising at least oneadditional choosing process.
 13. A compound database stored on acomputer readable medium for searching at least one ligand-candidatecompound for a biopolymer, the database comprising three-dimensionalcoordinates of at least two database entries, atomic coordinates ofhydrogen atoms, force field atom-type, atomic charge of atoms, hydrogenbonding category number of heteroatoms, rotatable bonds, and bondrotation schemes for the rotatable bonds, wherein the database issearchable to find at least one ligand-candidate compound.
 14. Themethod of claim 13, wherein the compound database comprises at least10,000 compounds.
 15. The method of claim 13, wherein the databasefurther comprises an automatic docking method.
 16. The method of claim15, wherein the automatic docking method is the “ADAM” process.
 17. Amethod of selecting at least one ligand-candidate compound suitable forbinding to a target biopolymer from a compound database comprisinginformation of three-dimensional coordinates of at least two databaseentries in arbitrary conformation comprising: (a) assigninghydrogen-bonding category numbers, calculating force-field energies, andgenerating conformations for at least two database entries; (b)preparing physicochemical information about a ligand-binding region andat least one dummy atom of the target biopolymer; (c) estimating themost stable docking structure between the target biopolymer and each ofat least two database entries, wherein the estimating comprisesperforming an automated search; (d) determining whether the trialcompound is a candidate based on the most stable docking structure. 18.The method of claim 17, wherein the compound database comprises at least10,000 compounds.
 19. The method of claim 17, wherein the compounddatabase comprises information for calculating interaction energy andfor generating conformations.
 20. The method of claim 17, furthercomprising choosing ligand-candidate compounds chosen in step (d)comprising at least one additional choosing process.
 21. The method ofclaim 20, wherein said at least one additional choosing processcomprises calculating van der Waals and electrostatic interactions tochoose a most stable docking structure.
 22. The method of claim 17,wherein the estimating further comprises preparing combinatorialcorrespondences between at least one hydrogen-bonding heteroatom of thetrial compound and at least one dummy atom of the target biopolymer. 23.The method of claim 22, wherein the estimating comprises preserving theimpossible combinatorial correspondences between at least onehydrogen-bonding heteroatom of the trial compound and at least one dummyatom of the target biopolymer.
 24. The method of claim 23, wherein theestimating comprises simultaneously estimating hydrogen-binding modesbetween the trial compound and target biopolymer.
 25. The method ofclaim 24, wherein the estimating comprises obtaining a docking structurebetween the trial compound and target biopolymer.
 26. A method ofselecting at least one ligand-candidate compound suitable for binding toa target biopolymer from a compound database comprising information ofthree-dimensional coordinates of at least two database entries inarbitrary conformation estimating the most stable docking structurebetween the target biopolymer and each of at least two entries as atrial compound wherein the estimating comprises consideration of thepossible binding modes and conformations of the trial compound.
 27. Themethod of claim 26, wherein the estimating comprises an automaticdocking method.
 28. The method of claim 27, wherein the automaticdocking method is the “ADAM” process.
 29. The method of claim 26,wherein the compound database comprises at least 10,000 compounds. 30.The method of claim 26, wherein the compound database comprisesinformation for calculating interaction energy and for generatingconformations.
 31. A method of selecting at least one ligand-candidatecompound suitable for binding to a target biopolymer from a compounddatabase comprising estimating the most stable docking structure betweenthe target biopolymer and each of at least two entries as a trialcompound by an automatic docking method; wherein the estimatingcomprises consideration of the possible binding modes and conformationsof the trial compound; and wherein the compound database comprisesinformation of three-dimensional coordinates of at least two databaseentries in arbitrary conformation.
 32. The method of claim 31, whereinthe automatic docking method is the “ADAM” process.
 33. The method ofclaim 31, wherein the compound database comprises at least 10,000compounds.
 34. The method of claim 31, wherein the compound databasecomprises information for calculating interaction energy and forgenerating conformations.
 35. A method of selecting at least oneligand-candidate compound suitable for binding to a target biopolymerfrom a compound database comprising estimating the most stable dockingstructure between the target biopolymer and each of at least two entriesas a trial compound by an automatic docking method; wherein theestimating comprises consideration of the possible binding modes andconformations of the trial compound based upon the interaction betweenhydrogen-bonding heteroatoms in the compound and hydrogen-bonding dummyatoms in the target biopolymer; and wherein the compound databasecomprises information of three-dimensional coordinates of at least twodatabase entries in arbitrary conformation.
 36. The method of claim 35,wherein the automatic docking method is the “ADAM” process.
 37. Themethod of claim 35, wherein the compound database comprises at least10,000 compounds.
 38. The method of claim 35, wherein the compounddatabase comprises information for calculating interaction energy andfor generating conformations.
 39. A method of selecting at least oneligand-candidate compound suitable for binding to a target biopolymerfrom a compound database comprising estimating the most stable dockingstructure between the target biopolymer and each of at least two entriesas a trial compound; wherein the estimating comprises calculation ofintermolecular energy, intramolecular energy, or the number of hydrogenbonds; and wherein the compound database comprises information ofthree-dimensional coordinates of at least two database entries inarbitrary conformation.
 40. The method of claim 39, wherein the compounddatabase comprises at least 10,000 compounds.
 41. The method of claim39, wherein the compound database comprises information for calculatinginteraction energy and for generating conformations.
 42. A compounddatabase stored on a computer readable medium for searching at least oneligand-candidate compound for a biopolymer comprising information forcalculating interaction energy and information for generatingconformations.
 43. The database of claim 42, wherein the compounddatabase comprises at least 10,000 compounds.